Implicit surfaces with globally regularised and compactly supported basis functions
We consider the problem of constructing a function whose zero set represents a surface, given sample points with surface normal vectors. Our contributions include a novel means of regularising multi-scale compactly supported basis functions that leads to desirable properties previously associated only with fully supported bases, and we show equivalence to a Gaussian process with a modified covariance function. We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as on 4D time-series data.
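To make the construction concrete, here is a hedged toy sketch of implicit-surface fitting with a compactly supported Wendland basis, using off-surface constraint points generated from the normals. This is not the paper's regularised multi-scale method; the kernel, support radius, offset, and ridge term are illustrative assumptions.

```python
import numpy as np

def wendland_c2(r, support):
    # Wendland C2 compactly supported kernel: exactly zero beyond `support`
    s = np.clip(r / support, 0.0, 1.0)
    return (1.0 - s) ** 4 * (4.0 * s + 1.0)

# Sample points on the unit circle with outward normals
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
pts = np.c_[np.cos(t), np.sin(t)]
normals = pts.copy()  # on the unit circle the normal equals the position

# Off-surface constraints along the normals (signed-distance style values)
eps = 0.1
X = np.vstack([pts, pts + eps * normals, pts - eps * normals])
y = np.concatenate([np.zeros(len(pts)), eps * np.ones(len(pts)), -eps * np.ones(len(pts))])

# Interpolation system with centres at all constraint points
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
A = wendland_c2(D, support=1.5)
w = np.linalg.solve(A + 1e-8 * np.eye(len(X)), y)  # tiny ridge for stability

def f(q):
    # Evaluate the implicit function at a query point q
    d = np.linalg.norm(X - q, axis=-1)
    return wendland_c2(d, 1.5) @ w

# The zero set of f approximates the circle: ~0 at an on-surface point
print(round(float(f(np.array([1.0, 0.0]))), 4))
```

Because the kernel has compact support, the interpolation matrix is sparse for large point sets; the regularisation the abstract describes is what recovers fully-supported-basis behaviour from such local bases.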
Sensitive and Scalable Online Evaluation with Theoretical Guarantees
Multileaved comparison methods generalize interleaved comparison methods to
provide a scalable approach for comparing ranking systems based on regular user
interactions. Such methods enable the increasingly rapid research and
development of search engines. However, existing multileaved comparison methods
that provide reliable outcomes do so by degrading the user experience during
evaluation. Conversely, current multileaved comparison methods that maintain
the user experience cannot guarantee correctness. Our contribution is two-fold.
First, we propose a theoretical framework for systematically comparing
multileaved comparison methods using the notions of considerateness, which
concerns maintaining the user experience, and fidelity, which concerns reliable
correct outcomes. Second, we introduce a novel multileaved comparison method,
Pairwise Preference Multileaving (PPM), that performs comparisons based on
document-pair preferences, and prove that it is considerate and has fidelity.
We show empirically that, compared to previous multileaved comparison methods,
PPM is more sensitive to user preferences and scalable with the number of
rankers being compared.
Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.
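For background, a minimal sketch of team-draft multileaving, one of the earlier multileaved comparison methods that PPM is contrasted with. This is not PPM itself; the rankers, click model, and credit rule below are simplified assumptions.

```python
import random

def team_draft_multileave(rankings, length, rng):
    # Team-draft multileaving: each round, rankers take turns in a random
    # order, contributing their highest-ranked not-yet-picked document.
    combined, teams = [], []
    while len(combined) < length:
        order = list(range(len(rankings)))
        rng.shuffle(order)
        for r in order:
            doc = next(d for d in rankings[r] if d not in combined)
            combined.append(doc)
            teams.append(r)
            if len(combined) == length:
                break
    return combined, teams

def credit(teams, clicks, n_rankers):
    # One credit point per clicked position, awarded to the ranker whose
    # turn contributed that document.
    scores = [0] * n_rankers
    for i in clicks:
        scores[teams[i]] += 1
    return scores

# Simulated comparison: ranker A places the relevant documents on top
A = ["d1", "d2", "d3", "d4"]
B = ["d4", "d3", "d2", "d1"]
relevant = {"d1", "d2"}
rng, wins = random.Random(0), [0, 0]
for _ in range(200):
    combined, teams = team_draft_multileave([A, B], 4, rng)
    clicks = [i for i, d in enumerate(combined) if d in relevant]
    s = credit(teams, clicks, 2)
    if s[0] != s[1]:
        wins[0 if s[0] > s[1] else 1] += 1
print(wins)  # A, which ranks the relevant documents first, wins every trial
```

PPM replaces this kind of per-document team credit with inferred document-pair preferences, which is what yields its considerateness and fidelity guarantees.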
Local Algorithms for Block Models with Side Information
There has been a recent interest in understanding the power of local
algorithms for optimization and inference problems on sparse graphs. Gamarnik
and Sudan (2014) showed that local algorithms are weaker than global algorithms
for finding large independent sets in sparse random regular graphs. Montanari
(2015) showed that local algorithms are suboptimal for finding a community with
high connectivity in sparse Erdős-Rényi random graphs. For the
symmetric planted partition problem (also named community detection for the
block models) on sparse graphs, a simple observation is that local algorithms
cannot have non-trivial performance.
In this work we consider the effect of side information on local algorithms
for community detection under the binary symmetric stochastic block model. In
the block model with side information, each vertex is independently and
uniformly at random assigned one of two labels; each pair of vertices is
connected independently with one edge probability if the two endpoints
share a label, and with a smaller probability otherwise. The goal is to
estimate the underlying vertex labeling given 1) the graph structure and
2) side information in the form of a vertex labeling positively correlated
with the true one. In terms of the ratio between the in- and out-degrees
and the average degree, we characterize three different regimes under which a
local algorithm, namely, belief propagation run on the local neighborhoods,
maximizes the expected fraction of vertices labeled correctly. Thus, in
contrast to the case of symmetric block models without side information, we
show that local algorithms can achieve optimal performance for the block model
with side information.
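For intuition, the sketch below generates a binary symmetric block model with noisy label side information and runs a single round of neighbourhood belief aggregation: a crude local stand-in for the belief propagation analysed in the paper. All parameter values are illustrative assumptions, not the paper's regimes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, a, b, noise = 500, 12.0, 3.0, 0.3

# Binary symmetric SBM: random +/-1 labels, then independent edges with
# probability a/n (same label) or b/n (different labels)
labels = rng.choice([-1, 1], size=n)
same = labels[:, None] == labels[None, :]
prob = np.where(same, a / n, b / n)
upper = np.triu(rng.random((n, n)) < prob, k=1)
adj = (upper | upper.T).astype(int)

# Side information: each vertex's label revealed, flipped with prob `noise`
flips = rng.random(n) < noise
side = labels * np.where(flips, -1, 1)

# One round of local aggregation: combine a vertex's own noisy label with
# its neighbours' noisy labels, weighted by log-likelihood ratios
w_self = np.log((1 - noise) / noise)
p_nbr_ok = (a / (a + b)) * (1 - noise) + (b / (a + b)) * noise
w_nbr = np.log(p_nbr_ok / (1 - p_nbr_ok))
score = w_self * side + w_nbr * (adj @ side)
est = np.where(score >= 0, 1, -1)

base = np.mean(side == labels)  # accuracy of side information alone
acc = np.mean(est == labels)    # accuracy after one aggregation round
print(round(base, 3), round(acc, 3))
```

Even this single local round improves on the side information alone, illustrating why side information rescues local algorithms in a setting where, without it, they cannot beat trivial performance.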
Backbone of complex networks of corporations: The flow of control
We present a methodology to extract the backbone of complex networks based on
the weight and direction of links, as well as on nontopological properties of
nodes. We show how the methodology can be applied in general to networks in
which mass or energy is flowing along the links. In particular, the procedure
enables us to address important questions in economics, namely, how control and
wealth are structured and concentrated across national markets. We report on
the first cross-country investigation of ownership networks, focusing on the
stock markets of 48 countries around the world. On the one hand, our analysis
confirms results expected on the basis of the literature on corporate control,
namely, that in Anglo-Saxon countries control tends to be dispersed among
numerous shareholders. On the other hand, it also reveals that in the same
countries, control is found to be highly concentrated at the global level,
namely, lying in the hands of very few important shareholders. Interestingly,
the exact opposite is observed for European countries. These results have
previously not been reported as they are not observable without the kind of
network analysis developed here.
Comment: 24 pages, 12 figures, 2nd version (text made more concise and readable, results unchanged).
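To give a flavour of link-based backbone extraction, here is a generic weight-thresholding sketch: for each node, keep its strongest outgoing ownership links until most of the node's total outgoing weight is covered. This is a simplified assumption-laden stand-in, not the paper's actual procedure, which also exploits link direction and nontopological node properties.

```python
from collections import defaultdict

def backbone(edges, threshold=0.8):
    # Keep, per source node, the strongest outgoing weighted links until
    # `threshold` of the node's total outgoing weight is covered.
    out = defaultdict(list)
    for src, dst, w in edges:
        out[src].append((w, dst))
    kept = []
    for src, links in out.items():
        total = sum(w for w, _ in links)
        acc = 0.0
        for w, dst in sorted(links, reverse=True):
            if acc >= threshold * total:
                break
            kept.append((src, dst, w))
            acc += w
    return kept

# Toy ownership network: shareholders A and B with ownership fractions
edges = [("A", "X", 0.6), ("A", "Y", 0.3), ("A", "Z", 0.1),
         ("B", "X", 0.5), ("B", "Y", 0.5)]
kept = backbone(edges)
print(sorted(kept))  # the weak A->Z link (10% of A's weight) is dropped
```

Pruning rules of this kind are what let the dominant flows of control stand out from the thicket of minor cross-holdings in a national stock market.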
Kernel Dependency Estimation
We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be, for example, vectors, images, strings, trees, or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. Output kernels also make it possible to encode prior information and/or invariances in the loss function in an elegant way. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.
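A hedged miniature of the idea: regress from the input kernel onto output-kernel features, then recover an output by a naive nearest-candidate pre-image step. The kernels, toy dependency, and candidate-based pre-image below are illustrative assumptions; the original work uses kernel PCA and a proper pre-image computation.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # RBF kernel matrix between row sets A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy dependency: the output is the input rotated by 90 degrees
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
R = np.array([[0.0, -1.0], [1.0, 0.0]])
Y = X @ R.T

# Kernel ridge regression from the input kernel to output-kernel features:
# learn alpha so that Kx @ alpha ~ Ky (the dependency in feature space)
Kx, Ky = rbf(X, X), rbf(Y, Y)
alpha = np.linalg.solve(Kx + 1e-3 * np.eye(len(X)), Ky)

def predict(x_new, candidates):
    # Predict the output-kernel profile for x_new, then take as pre-image
    # the candidate output whose kernel profile matches it best
    k_new = rbf(x_new[None, :], X) @ alpha   # shape (1, n)
    k_cand = rbf(candidates, Y)              # shape (m, n)
    return candidates[np.argmin(((k_cand - k_new) ** 2).sum(-1))]

x = np.array([1.0, 0.0])
print(predict(x, np.array([[0.0, 1.0], [0.0, -1.0], [1.0, 0.0]])))
```

Because both sides are kernelised, the same machinery applies unchanged when the inputs and outputs are strings, trees, or graphs, with only the kernel functions swapped out.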
Assessment on experimental bacterial biofilms and in clinical practice of the efficacy of sampling solutions for microbiological testing of endoscopes
Opinions differ on the value of microbiological testing of endoscopes, which varies according to the technique used. We compared the efficacy on bacterial biofilms of sampling solutions used for the surveillance of the contamination of endoscope channels. To compare efficacy, we used an experimental model of a 48-h Pseudomonas biofilm grown on endoscope internal tubing. Sampling of this experimental biofilm was performed with a Tween 80-lecithin-based solution, saline, and sterile water. We also performed a randomized prospective study during routine clinical practice in our hospital, sampling endoscopes after reprocessing with two different solutions chosen at random. Biofilm recovery, expressed as the logarithmic ratio of bacteria recovered to bacteria initially present in the biofilm, was significantly more effective with the Tween 80-lecithin-based solution than with saline (P = 0.002) or sterile water (P = 0.002). There was no significant difference between saline and sterile water. In the randomized clinical study, the rates of endoscopes found contaminated with the Tween 80-lecithin-based sampling solution and with saline were 8/25 and 1/25, respectively (P = 0.02), and the mean numbers of bacteria recovered were 281 and 19 CFU/100 ml, respectively (P = 0.001). In conclusion, the efficiency, and therefore the value, of monitoring endoscope reprocessing by microbiological cultures depends on the sampling solution used. A sampling solution with a tensioactive action is more efficient than saline in detecting biofilm contamination of endoscopes.
Confidential Boosting with Random Linear Classifiers for Outsourced User-generated Data
User-generated data is crucial to predictive modeling in many applications.
With a web/mobile/wearable interface, a data owner can continuously record data
generated by distributed users and build various predictive models from the
data to improve their operations, services, and revenue. Due to the large size
and evolving nature of user data, data owners may rely on public cloud service
providers (Cloud) for storage and computation scalability. Exposing sensitive
user-generated data and advanced analytic models to Cloud raises privacy
concerns. We present a confidential learning framework, SecureBoost, for data
owners that want to learn predictive models from aggregated user-generated data
but offload the storage and computational burden to Cloud without having to
worry about protecting the sensitive data. SecureBoost allows users to submit
encrypted or randomly masked data to designated Cloud directly. Our framework
utilizes random linear classifiers (RLCs) as the base classifiers in the
boosting framework to dramatically simplify the design of the proposed
confidential boosting protocols, yet still preserve the model quality. A
Cryptographic Service Provider (CSP) is used to assist the Cloud's processing,
reducing the complexity of the protocol constructions. We present two
constructions of SecureBoost: HE+GC and SecSh+GC, using combinations of
homomorphic encryption, garbled circuits, and random masking to achieve both
security and efficiency. For a boosted model, Cloud learns only the RLCs and
the CSP learns only the weights of the RLCs. Finally, the data owner collects
the two parts to get the complete model. We conduct extensive experiments to
understand the quality of the RLC-based boosting and the cost distribution of
the constructions. Our results show that SecureBoost can efficiently learn
high-quality boosting models from protected user-generated data.
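Setting the cryptography aside, the learning core — boosting with random linear classifiers as base learners — can be sketched as follows. The data, seed, round count, and number of random tries are illustrative assumptions, and no encryption or masking is performed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs with labels -1 / +1
n = 200
X = np.vstack([rng.normal(-1.5, 1, (n // 2, 2)), rng.normal(1.5, 1, (n // 2, 2))])
y = np.r_[-np.ones(n // 2), np.ones(n // 2)]

def random_linear_classifier(X, y, weights, tries=25):
    # Sample random hyperplanes and keep the one with lowest weighted
    # error; flipping the sign lets an anti-correlated hyperplane help too.
    best = None
    for _ in range(tries):
        w, b = rng.normal(size=X.shape[1]), rng.normal()
        pred = np.sign(X @ w + b)
        pred[pred == 0] = 1
        err = weights @ (pred != y)
        for p, e in ((pred, err), (-pred, 1 - err)):
            if best is None or e < best[0]:
                best = (e, p, (w, b) if p is pred else (-w, -b))
    return best

# AdaBoost-style reweighting with RLCs as the base classifiers
weights = np.full(n, 1 / n)
ensemble, F = [], np.zeros(n)
for _ in range(30):
    err, pred, params = random_linear_classifier(X, y, weights)
    err = np.clip(err, 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)
    ensemble.append((alpha, params))     # (weight, hyperplane) pairs
    F += alpha * pred
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

acc = np.mean(np.sign(F) == y)
print(round(acc, 3))
```

Because the base hyperplanes are random rather than trained, no data-dependent fitting is needed inside the protected computation; in the paper's split, Cloud would hold only the RLCs and the CSP only their weights.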
Implicitly Constrained Semi-Supervised Least Squares Classification
We introduce a novel semi-supervised version of the least squares classifier.
This implicitly constrained least squares (ICLS) classifier minimizes the
squared loss on the labeled data among the set of parameters implied by all
possible labelings of the unlabeled data. Unlike other discriminative
semi-supervised methods, our approach does not introduce explicit additional
assumptions into the objective function, but leverages implicit assumptions
already present in the choice of the supervised least squares classifier. We
show this approach can be formulated as a quadratic programming problem and its
solution can be found using a simple gradient descent procedure. We prove that,
in a certain way, our method never leads to performance worse than the
supervised classifier. Experimental results corroborate this theoretical result
in the multidimensional case on benchmark datasets, also in terms of the error
rate.
Comment: 12 pages, 2 figures, 1 table. The Fourteenth International Symposium on Intelligent Data Analysis (2015), Saint-Etienne, France.
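To make the construction concrete, here is a minimal 1-D sketch of the implicit-constraint idea, not the authors' implementation: the least-squares solution is parametrised by soft labels for the unlabeled points, and projected gradient descent minimises the labeled loss over those labels. The data, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labeled data (two 1-D clusters) and unlabeled data from the same mixture
Xl = np.r_[rng.normal(-2, 1, 10), rng.normal(2, 1, 10)][:, None]
yl = np.r_[np.zeros(10), np.ones(10)]
Xu = np.r_[rng.normal(-2, 1, 30), rng.normal(2, 1, 30)][:, None]

def with_intercept(X):
    return np.c_[X, np.ones(len(X))]

Al, Au = with_intercept(Xl), with_intercept(Xu)
A = np.vstack([Al, Au])
P = np.linalg.inv(A.T @ A)

def theta(q):
    # Supervised least-squares solution implied by soft labels q in [0,1]
    return P @ (Al.T @ yl + Au.T @ q)

# Projected gradient descent on q, minimising the *labeled* squared loss
q = np.full(len(Xu), 0.5)
for _ in range(500):
    resid = Al @ theta(q) - yl
    grad = 2 * Au @ P @ Al.T @ resid   # d/dq of ||Al theta(q) - yl||^2
    q = np.clip(q - 0.05 * grad, 0.0, 1.0)

w = theta(q)
# Classify at the two cluster centres with the usual 0.5 threshold
acc = np.mean((with_intercept(np.array([[-2.0], [2.0]])) @ w > 0.5) == [False, True])
print(acc)
```

The search never leaves the set of solutions some labeling of the unlabeled data could produce, which is the mechanism behind the paper's guarantee of never doing worse than the supervised classifier.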